Abstract

Transportation networks play a crucial role in human mobility, the exchange of goods, and the 
spread of invasive species. With 90% of world trade carried by sea, the global network of merchant 
ships provides one of the most important modes of transportation. Here we use information about 
the itineraries of 16,363 cargo ships during the year 2007 to construct a network of links between 
ports. We show that the network has several features which set it apart from other transportation 
networks. In particular, most ships can be classified into three categories: bulk dry carriers, container
ships and oil tankers. These three categories differ not only in the ships' physical characteristics,
but also in their mobility patterns and networks. Container ships follow regularly repeating paths 
whereas bulk dry carriers and oil tankers move less predictably between ports. The network of all 
ship movements possesses a heavy-tailed distribution for the connectivity of ports and for the loads 
transported on the links with systematic differences between ship types. The data analyzed in this 
paper improve current assumptions based on gravity models of ship movements, an important step 
towards understanding patterns of global trade and bioinvasion.

I. INTRODUCTION

The ability to travel, trade commodities, and share information around the world with 
unprecedented efficiency is a defining feature of the modern globalized economy. Among the 
different means of transport, ocean shipping stands out as the most energy efficient mode 
of long-distance transport for large quantities of goods (Rodrigue et al. 2006). According 
to estimates, as much as 90% of world trade is hauled by ships (International Maritime 
Organization 2006). In 2006, 7.4 billion tons of goods were loaded at the world's ports. The 
trade volume currently exceeds 30 trillion ton-miles and is growing at a rate faster than the 
global economy (United Nations Conference on Trade and Development 2007).

The worldwide maritime network also plays a crucial role in today's spread of invasive 
species. Two major pathways for marine bioinvasion are discharged water from ships' ballast 
tanks (Ruiz et al. 2000) and hull fouling (Drake & Lodge 2007). Even terrestrial species 
such as insects are sometimes inadvertently transported in shipping containers (Lounibos 
2002). In several parts of the world, invasive species have caused dramatic levels of species 
extinction and landscape alteration, thus damaging ecosystems and creating hazards for 
human livelihoods, health, and local economies (Mack et al. 2000). The financial loss due 
to bioinvasion is estimated to be $120 billion per year in the United States alone (Pimentel 
et al. 2005). 

Despite affecting everybody's daily lives, the shipping industry is far less in the public 
eye than other sectors of the global transport infrastructure. Accordingly, it has also
received little attention in the recent literature on complex networks (Wei et al. 2007, Hu
& Zhu 2009). This neglect is surprising considering the current interest in networks (Al- 
bert & Barabasi 2002, Newman 2003a, Gross & Blasius 2008), especially airport (Barrat et 
al. 2004, Guimera & Amaral 2004, Hufnagel et al. 2004, Guimera et al. 2005), road (Buhl et 
al. 2006, Barthelemy & Flammini 2008) and train networks (Latora & Marchiori 2002, Sen 
et al. 2003). In the spirit of current network research, we take here a large-scale perspective 
on the global cargo ship network (GCSN) as a complex system defined as the network of 
ports that are connected by links if ship traffic passes between them. 

Similar research in the past had to make strong assumptions about flows on hypothetical
networks with connections between all pairs of ports in order to approximate ship movements
(Drake & Lodge 2004, Tatem et al. 2006). By contrast, our analysis is based on
comprehensive data of real ship journeys allowing us to construct the actual network. We
show that it has a small-world topology where the combined cargo capacity of ships calling
at a given port (measured in gross tonnage) follows a heavy-tailed distribution. This capacity
scales superlinearly with the number of directly connected ports. We identify the most
central ports in the network and find several groups of highly interconnected ports showing 
the importance of regional geopolitical and trading blocks. 

A high-level description of the complete network, however, does not yet fully capture the 
network's complexity. Unlike previously studied transportation networks, the GCSN has a 
multi-layered structure. There are, broadly speaking, three classes of cargo ships - container 
ships, bulk dry carriers, and oil tankers - that span distinct subnetworks. Ships in different 
categories tend to call at different ports and travel in distinct patterns. We analyze the 
trajectories of individual ships in the GCSN and develop techniques to extract quantitative 
information about characteristic movement types. With these methods we can quantify that 
container ships sail along more predictable, frequently repeating routes than oil tankers or 
bulk dry carriers. We compare the empirical data with theoretical traffic flows calculated 
by the gravity model. Simulation results, based on the full GCSN data or the gravity model 
differ significantly in a population-dynamic model for the spread of invasive species between 
the world's ports. Predictions based on the real network are thus more informative for 
international policy decisions concerning the stability of worldwide trade and for reducing 
the risks of bioinvasion. 



II. DATA 



An analysis of global ship movements requires detailed knowledge of ships' arrival and 
departure times at their ports of call. Such data have become available in recent years. 
Starting in 2001, ships and ports have begun installing Automatic Identification System 
(AIS) equipment. AIS transmitters on board the ships automatically report the arrival and
departure times to the port authorities. This technology is primarily used to avoid collisions 
and increase port security, but arrival and departure records are also made available by 
Lloyd's Register Fairplay for commercial purposes as part of its Sea-web data base
(www.sea-web.com). AIS devices have not been installed in all ships and ports yet, and therefore there
are some gaps in the data. Still, all major ports and the largest ships are included, thus the
data base represents the majority of cargo transported on ships.

Our study is based on Sea-web's arrival and departure records in the calendar year 2007 
as well as Sea-web's comprehensive data on the ships' physical characteristics. We restrict
our study to cargo ships bigger than 10,000 GT (gross tonnage) which make up 93% of
the world's total capacity for ship cargo transport. From these we select all 16,363 ships
for which AIS data are available, taken as representative of the global traffic and
long-distance trade between the 951 ports equipped with AIS receivers (for details see Electronic
Supplementary Material). For each ship we obtain a trajectory from the data base, i.e.
a list of ports visited by the ship sorted by date. In 2007, there were 490,517 nonstop
journeys linking 36,351 distinct pairs of arrival and departure ports. The complete set of
trajectories, each path representing the shortest route at sea and colored by the number of
journeys passing through it, is shown in Fig. 1a.

Each trajectory can be interpreted as a small directed network where the nodes are ports 
linked together if the ship traveled directly between the ports. Larger networks can be 
defined by merging trajectories of different ships. In this article we aggregate trajectories in 
four different ways: the combined network of all available trajectories, and the subnetworks 
of container ships (3,100 ships), bulk dry carriers (5,498) and oil tankers (2,628). These three
subnetworks combined cover 74% of the GCSN's total gross tonnage. In all four networks,
we assign a weight $w_{ij}$ to the link from port $i$ to $j$ equal to the sum of the available space
on all ships that have traveled on the link during 2007 measured in GT. If a ship made the
journey from $i$ to $j$ more than once, its capacity contributes multiple times to $w_{ij}$.
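
The aggregation of trajectories into link weights can be sketched in a few lines of Python. This is only an illustration: the input format (a list of gross tonnage and port sequence pairs), the function name `build_weighted_network` and the toy ships are assumptions made here, not part of the Sea-web data.

```python
from collections import defaultdict

def build_weighted_network(trajectories):
    """Aggregate ship trajectories into link weights w[(i, j)] in GT.

    `trajectories` is an assumed input format: a list of
    (gross_tonnage, [port, port, ...]) tuples, ports sorted by date.
    """
    weights = defaultdict(float)
    for gross_tonnage, ports in trajectories:
        # Each pair of successive port calls is one nonstop journey.
        for origin, destination in zip(ports, ports[1:]):
            # A repeated journey adds the ship's capacity again.
            weights[(origin, destination)] += gross_tonnage
    return dict(weights)

# Toy example with made-up ships and ports.
toy = [
    (50_000, ["A", "B", "C", "A", "B"]),
    (20_000, ["B", "A"]),
]
w = build_weighted_network(toy)
```

In this toy case the first ship travels from A to B twice, so its capacity is counted twice on that link, exactly as described for repeated journeys.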

III. THE GLOBAL CARGO SHIP NETWORK 

The directed network of the entire cargo fleet is noticeably asymmetric, with 59% of all 
linked pairs of ports being connected only in one direction. Still, the vast majority of ports 
(935 out of 951) belongs to one single strongly connected component, i.e. for any two ports 
in this component there are routes in both directions, though possibly visiting different 
intermediate ports. The routes are intriguingly short: only a few steps in the network are
needed to get from one port to another. The shortest path length $l$ between two ports is
the minimum number of nonstop connections one must take to travel between origin and
destination. In the GCSN, the average over all pairs of ports is extremely small, $\langle l \rangle = 2.5$.
Even the maximum shortest path between any two ports (e.g. from Skagway, Alaska, to the
small Italian island of Lampedusa), is only of length $l_{max} = 8$. In fact, the majority of all
possible origin-destination pairs (52%) can already be connected by two steps or fewer.
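
For an unweighted directed network, shortest path lengths of this kind can be obtained by breadth-first search. The following sketch assumes the network is given as a list of (origin, destination) links; the function names and the toy ports are hypothetical, and only reachable ordered pairs enter the average.

```python
from collections import deque

def shortest_path_lengths(links, source):
    """Breadth-first search: steps from `source` to every reachable port."""
    neighbors = {}
    for origin, destination in links:
        neighbors.setdefault(origin, []).append(destination)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        port = queue.popleft()
        for nxt in neighbors.get(port, []):
            if nxt not in dist:
                dist[nxt] = dist[port] + 1
                queue.append(nxt)
    return dist

def path_statistics(links):
    """Mean and maximum shortest path length over reachable ordered pairs."""
    ports = {p for link in links for p in link}
    lengths = []
    for source in ports:
        dist = shortest_path_lengths(links, source)
        lengths.extend(d for target, d in dist.items() if target != source)
    return sum(lengths) / len(lengths), max(lengths)

# Toy directed ring with one shortcut (hypothetical ports).
toy_links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
mean_l, max_l = path_statistics(toy_links)
```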

Comparing these findings to those reported for the worldwide airport network (WAN) 
shows interesting differences and similarities. The high asymmetry of the GCSN has not 
been found in the WAN, indicating that ship traffic is structurally very different from avi- 
ation. Rather than being formed by the accumulation of back and forth trips, ship traffic 
seems to be governed by an optimal arrangement of unidirectional, often circular routes. 
This optimality also shows in the GCSN's small shortest path lengths. In comparison, in
the WAN, the average and maximum shortest path lengths are $\langle l \rangle = 4.4$ and $l_{max} = 15$
respectively (Guimera et al. 2005), i.e. about twice as long as in the GCSN. Similar to the
WAN, the GCSN is highly clustered: if a port X is linked to ports Y and Z, there is a 
high probability that there is also a connection from Y to Z. We calculated a clustering 
coefficient C (Watts & Strogatz 1998) for directed networks and find C = 0.49 whereas 
random networks with the same number of nodes and links only yield C = 0.04 on average.
Degree dependent clustering coefficients $C_k$ reveal that clustering decreases with node
degree (see Electronic Supplementary Material). Therefore, the GCSN - like the WAN - 
can be regarded as a small-world network possessing short path lengths despite substantial 
clustering (Watts & Strogatz 1998). However, the average degree of the GCSN, i.e. the 
average number of links arriving at and departing from a given port (in- plus out-degree), 
$\langle k \rangle = 76.5$, is notably higher than in the WAN where $\langle k \rangle = 19.4$ (Barrat et al. 2004). In
the light of the network size (the WAN consists of 3880 nodes), this difference becomes 
even more pronounced, indicating that the GCSN is much more densely connected. This 
redundancy of links gives the network high structural robustness to the loss of routes for 
keeping up trade. 

The degree distribution $P(k)$ shows that most ports have few connections, but there
are some ports linked to hundreds of other ports (Fig. 2a). Similar right-skewed degree
distributions have been observed in many real-world networks (Barabasi & Albert 1999).
While the GCSN's degree distribution is not exactly scale-free, the distribution of link
weights, $P(w)$, follows approximately a power law $P(w) \propto w^{-\mu}$ with $\mu = 1.71 \pm 0.14$ (95%
CI for linear regression, Fig. 2b, see also Electronic Supplementary Material). By averaging
the sums of the link weights arriving at and departing from port $i$, we obtain the node
strength $s_i$ (Barrat et al. 2004). The strength distribution can also be approximated by a
power law $P(s) \propto s^{-\eta}$ with $\eta = 1.02 \pm 0.17$, meaning that a small number of ports handle
huge amounts of cargo (Fig. 2). The determination of power law relationships by line fitting
has been strongly criticised (e.g. Newman 2005, Clauset et al. 2009); we therefore analysed
the distributions with model selection by Akaike weights (Burnham & Anderson 1998).
Our results confirm that a power law is a better fit than an exponential or a log-normal
distribution for $P(w)$ and $P(s)$, but not $P(k)$ (see Electronic Supplementary Material).
These findings agree well with the concept of hubs-spokes networks (Notteboom 2004) that 
were proposed for cargo traffic, for example in Asia (Robinson 1998). There are a few 
large, highly connected ports through which all smaller ports transact their trade. This 
scale-free property makes the ship trade network prone to the spreading and persistence 
of bioinvasive organisms (e.g. Pastor-Satorras & Vespignani 2001). The average nearest
neighbors' degree, a measure of network assortativity, additionally underlines the
hubs-spokes property of cargo ship traffic (see Electronic Supplementary Material).
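
The model selection by Akaike weights mentioned above follows a standard recipe: compute AIC = 2k - 2 ln L for each candidate distribution and convert the AIC differences into normalized weights. The sketch below illustrates this step only; the log-likelihood values are made up for illustration and are not results from this study, and the function name is hypothetical.

```python
import math

def akaike_weights(log_likelihoods, n_params):
    """Akaike weights from maximized log-likelihoods and parameter counts."""
    aic = [2 * k - 2 * ll for ll, k in zip(log_likelihoods, n_params)]
    best = min(aic)
    # Relative likelihood of each model given its AIC difference to the best.
    rel = [math.exp(-(a - best) / 2) for a in aic]
    total = sum(rel)
    return [r / total for r in rel]

# Made-up log-likelihoods for power law, exponential and log-normal fits.
weights = akaike_weights([-1000.0, -1010.0, -1002.0], [1, 1, 2])
```

The model with the largest weight is the one best supported by the data among the candidates considered.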

Strengths and degrees of the ports are related according to the scaling relation
$\langle s(k) \rangle \propto k^{1.46 \pm 0.1}$ (95% CI for SMA regression, Warton et al. 2006). Hence, the strength of a port
grows generally faster than its degree (Fig. 2). In other words, highly connected ports
not only have many links, but their links also have a higher than average weight. This
observation agrees with the fact that busy ports are better equipped to handle large ships
with large amounts of cargo. A similar result, $\langle s(k) \rangle \propto k^{1.5 \pm 0.1}$, was found for airports
(Barrat et al. 2004), which may hint at a general pattern in transportation networks. In
the light of bioinvasion, these results underline empirical findings that big ports are more
heavily invaded because of increased propagule pressure by ballast water of more and larger
ships (Mack et al. 2000, Williamson 1996, see e.g. Cohen & Carlton 1998).
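
A scaling exponent of this kind can be estimated as the standardized major axis (SMA) slope on log-transformed data, i.e. the ratio of the standard deviations of log strength and log degree, signed by their covariance. The sketch below is a simplified illustration: it fits individual points rather than degree-averaged strengths, omits the confidence interval, and uses synthetic data and a hypothetical function name.

```python
import math

def sma_exponent(degrees, strengths):
    """Standardized major axis slope of log(strength) against log(degree)."""
    x = [math.log(k) for k in degrees]
    y = [math.log(s) for s in strengths]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # SMA slope: ratio of standard deviations, signed by the covariance.
    return math.copysign(math.sqrt(syy / sxx), sxy)

# Synthetic ports obeying s = 3 * k**1.5 exactly (illustrative only).
ks = [2, 5, 10, 50, 200]
ss = [3 * k ** 1.5 for k in ks]
beta = sma_exponent(ks, ss)
```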

A further indication of the importance of a node is its betweenness centrality (Freeman 
1979, Newman 2004). The betweenness of a port is the number of topologically shortest 
directed paths in the network that pass through this port. In Fig. 1b we plot and list
the most central ports. Generally speaking, centrality and degree are strongly correlated
(Pearson's correlation coefficient: 0.81), but in individual cases other factors can also play a
role. The Panama and Suez Canals, for instance, are shortcuts to avoid long passages around
South America and Africa. Other ports have a high centrality because they are visited by
a large number of ships (e.g. Shanghai) whereas others gain their status primarily by being
connected to many different ports (e.g. Antwerp).
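
Betweenness on an unweighted directed network can be computed with a Brandes-style accumulation over breadth-first searches. The sketch below is one possible implementation and is not necessarily the procedure used for the results reported here; in particular, it assumes that when several shortest paths connect a pair of ports, each path contributes a fractional share. The function name and toy network are hypothetical.

```python
from collections import deque

def betweenness(links):
    """Betweenness of each node in a directed, unweighted network."""
    nodes = sorted({p for link in links for p in link})
    succ = {v: [] for v in nodes}
    for i, j in links:
        succ[i].append(j)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in nodes}   # predecessors on shortest paths
        sigma = {v: 0 for v in nodes}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Toy hub-and-spoke network: hub H linked in both directions to A, B, C.
star = [("H", "A"), ("A", "H"), ("H", "B"), ("B", "H"), ("H", "C"), ("C", "H")]
centrality = betweenness(star)
```

In the toy network every shortest path between two spokes passes through the hub, so the hub collects all six ordered spoke pairs while the spokes score zero.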

IV. THE NETWORK LAYERS OF DIFFERENT SHIP TYPES 

To compare the movements of cargo ships of different types, separate networks were 
generated for each of the three main ship types: container ships, bulk dry carriers, and oil 
tankers. Applying the network parameters introduced in the previous section to these three 
subnetworks reveals some broad-scale differences (see Table I). The network of container
ships is densely clustered, C = 0.52, has a rather low mean degree, $\langle k \rangle = 32.44$, and a large
mean number of journeys (i.e. number of times any ship passes) per link, $\langle J \rangle = 24.26$. The
bulk dry carrier network, on the other hand, is less clustered, has a higher mean degree,
and fewer journeys per link (C = 0.43, $\langle k \rangle = 44.61$, $\langle J \rangle = 4.65$). For the oil tankers, we
find intermediate values (C = 0.44, $\langle k \rangle = 33.32$, $\langle J \rangle = 5.07$). Note that the mean degrees
$\langle k \rangle$ of the subnetworks are substantially smaller than that of the full GCSN, indicating that
different ship types use essentially the same ports but different connections. 

A similar tendency appears in the scaling of the link weight distributions (Fig. 2b). $P(w)$
can be approximated as power laws for each network, but with different exponents $\mu$. The
container ships have the smallest exponent ($\mu = 1.42$) and bulk dry carriers the largest
($\mu = 1.93$) with oil tankers in between ($\mu = 1.73$). In contrast, the exponents for the
distribution of node strength $P(s)$ are nearly identical in all three subnetworks, $\eta = 1.05$,
$\eta = 1.13$ and $\eta = 1.01$, respectively.

These numbers give a first indication that different ship types move in distinctive pat- 
terns. Container ships typically follow set schedules visiting several ports in a fixed sequence 
along their way, thus providing regular services. Bulk dry carriers, by contrast, appear less 
predictable as they frequently change their routes on short notice depending on the current 
supply and demand of the goods they carry. The larger variety of origins and destinations 
in the bulk dry carrier network ($n = 616$ ports, compared to $n = 378$ for container ships)
explains the higher average degree and the smaller number of journeys for a given link. Oil
tankers also follow short-term market trends, but, because they can only load oil and oil
products, the number of possible destinations ($n = 505$) is more limited than for bulk dry
carriers. 

These differences are also underlined by the betweenness centralities of the three network
layers (see Electronic Supplementary Material). While some ports rank highly in all categories
(e.g. Suez Canal, Shanghai), others are specialized in certain ship types. For example,
the German port of Wilhelmshaven ranks tenth in terms of its world-wide betweenness for 
oil tankers, but is only 241st for bulk dry carriers and 324th for container ships. 

We can gain further insight into the roles of the ports by examining their community 
structure. Communities are groups of ports with many links within the groups but few links 
between different groups. We calculated these communities for the three subnetworks with a 
modularity optimization method for directed networks (Leicht & Newman 2008) and found 
that they differ significantly from modularities of corresponding Erdős-Rényi graphs (Fig. 3,
Guimera et al. 2004). The network of container trade shows 12 communities (Fig. 3a).
The largest ones are located (1) on the Arabian, Asian, and South African coasts, (2) on
the North American east coast and in the Caribbean, (3) in the Mediterranean, the Black
Sea, and on the European west coast, (4) in Northern Europe, and (5) in the Far East and
on the American west coast. The transport of bulk dry goods reveals 7 groups (Fig. 3b).
Some can be interpreted as geographic entities (e.g. North American east coast,
trans-Pacific trade) while others are dispersed on multiple continents. Especially interesting is
the community structure of the oil transportation network which shows 6 groups (Fig. 3c):
(1) the European, north and west African market, (2) a large community comprising Asia,
South Africa and Australia, (3) three groups for the Atlantic market with trade between 
Venezuela, the Gulf of Mexico, the American east coast and Northern Europe, and (4) the 
American Pacific Coast. It should be noted that the network includes the transport of crude 
oil as well as commerce with already refined oil products so that oil producing regions do 
not appear as separate communities. This may be due to the limit in the detectability 
of smaller communities by modularity optimization (Fortunato & Barthelemy 2007), but 
does not affect the relevance of the revealed ship traffic communities. Because of the, by 
definition, higher transport intensity within communities, bioinvasive spread is expected 
to be heavier between ports of the same community. However, in Fig. 3 it becomes clear
that there are no strict geographical barriers between communities. Thus, spread between 
communities is very likely to occur even on small spatial scales by shipping or ocean currents 
between close-by ports that belong to different communities. 
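
The quantity maximized in community detection for directed networks is the directed modularity $Q = \frac{1}{m} \sum_{ij} [A_{ij} - k_i^{out} k_j^{in} / m] \, \delta(c_i, c_j)$, where $m$ is the number of links and $c_i$ the community of port $i$. The sketch below only evaluates $Q$ for a given partition; it does not perform the optimization itself, assumes an unweighted network with distinct links, and uses a hypothetical function name and toy data.

```python
def directed_modularity(links, community):
    """Modularity Q of a partition of a directed, unweighted network.

    `links` is a list of distinct (origin, destination) pairs and
    `community` maps each node to a community label (assumed inputs).
    """
    m = len(links)
    out_deg, in_deg = {}, {}
    for i, j in links:
        out_deg[i] = out_deg.get(i, 0) + 1
        in_deg[j] = in_deg.get(j, 0) + 1
    link_set = set(links)
    q = 0.0
    for i in community:
        for j in community:
            if community[i] != community[j]:
                continue
            a_ij = 1.0 if (i, j) in link_set else 0.0
            # Observed link minus its expectation in the null model.
            q += a_ij - out_deg.get(i, 0) * in_deg.get(j, 0) / m
    return q / m

# Two directed triangles joined by a single link (toy example).
toy_links = [("A", "B"), ("B", "C"), ("C", "A"),
             ("D", "E"), ("E", "F"), ("F", "D"), ("C", "D")]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
q_split = directed_modularity(toy_links, split)
q_single = directed_modularity(toy_links, {v: 0 for v in split})
```

Placing all ports in one community gives $Q = 0$ by construction, whereas the partition into the two triangles yields a clearly positive value.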

Despite the differences between the three main cargo fleets, there is one unifying feature: 
their motif distribution (Milo et al. 2002). Like most previous studies, we focus here on
the occurrence of three-node motifs and present their normalized Z score, a measure for
their abundance in a network (Fig. 4). Strikingly, the three fleets have practically the
same motif distribution. In fact, the Z scores closely resemble those found in the World
Wide Web and different social networks which were conjectured to form a superfamily of
networks (Milo et al. 2004). This superfamily displays many transitive triplet interactions
(i.e. if $X \to Y$ and $Y \to Z$, then $X \to Z$); for example, the overrepresented motif 13 in
Fig. 4 has six such interactions. Intransitive motifs, like motif 6, are comparatively infrequent.
The abundance of transitive interactions in the ship networks indicates that cargo can be 
transported both directly between ports as well as via several intermediate ports. Thus, 
the high clustering and redundancy of links (robustness to link failures) appears not only 
in the GCSN but also in the three subnetworks. The similarity of the motif distributions 
to other humanly optimized networks underlines that cargo trade, like social networks and 
the World Wide Web, depends crucially on human interactions and information exchange. 
While advantageous for the robustness of trade, the clustering of links as triplets also has 
an unwanted side effect: in general, the more clustered a network, the more vulnerable it 
becomes to the global spread of alien species, even for low invasion probabilities (Newman 
2003b). 

V. NETWORK TRAJECTORIES 

Going beyond the network perspective, the data base also provides information about 
the movement characteristics per individual ship (Table I). The average number of distinct
ports per ship $\langle N \rangle$ does not differ much between different ship classes, but container ships
call much more frequently at ports than bulk dry carriers and oil tankers. This difference is 
explained by the characteristics and operational mode of these ships. Normally, container 
ships are fast (between 20 and 25 knots) and spend less time (1.9 days on average in our 
data) in the port for cargo operations. By contrast, bulk dry carriers and oil tankers move 
more slowly (between 13 and 17 knots) and stay longer in the ports (on average 5.6 days for 
bulk dry carriers, 4.6 days for oil tankers). 

The speed at sea and of cargo handling, however, is not the only operational difference. 
The topology of the trajectories also differs substantially. Characteristic sample trajectories 
for each ship type are presented in Fig. 5a-c. The container ship (Fig. 5a) travels on some of
the links several times during the study period whereas the bulk dry carrier (Fig. 5b) passes
almost every link exactly once. The oil tanker (Fig. 5c) commutes a few times between some
ports, but by and large also serves most links only once. 

We can express these trends in terms of a "regularity index" $p$ that quantifies how much
the frequency with which each link is used deviates from a random network. Consider the
trajectory of a ship calling $S$ times at $N$ distinct ports and travelling on $L$ distinct links.
We compare the mean number of journeys per link $f_{real} = S/L$ to the average link usage
$f_{ran}$ in an ensemble of randomized trajectories with the same number of nodes $N$ and port
calls $S$. To quantify the difference between real and random trajectories we calculate the Z
score $p = (f_{real} - f_{ran})/\sigma$ (where $\sigma$ is the standard deviation of $f$ in the random ensemble).
If $p = 0$, the real trajectory is indistinguishable from a random walk, whereas larger values
of $p$ indicate that the movement is more regular. Figures 5d-f present the distributions of
the regularity index $p$ for the different fleets. For container ships, $p$ is distributed broadly
around $p \approx 2$, thus supporting our earlier observation that most container ships provide
regular services between ports along their way. Trajectories of bulk dry carriers and oil
tankers, on the other hand, appear essentially random with the vast majority of ships near
$p = 0$.
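
The computation of the regularity index can be illustrated with a short sketch. The randomization used here is an assumption made for illustration: each random trajectory has the same number of port calls and draws every next port uniformly from the other ports of the real trajectory, which does not guarantee that all ports are visited. The exact null model is described in the Electronic Supplementary Material and may differ. Function names and the toy trajectory are hypothetical.

```python
import random
import statistics

def link_usage(trajectory):
    """Mean journeys per link, f = S / L, as defined in the text."""
    calls = len(trajectory)
    links = set(zip(trajectory, trajectory[1:]))
    return calls / len(links)

def regularity_index(trajectory, samples=1000, seed=0):
    """Z score of f against randomized trajectories (assumed null model)."""
    rng = random.Random(seed)
    ports = sorted(set(trajectory))
    random_f = []
    for _ in range(samples):
        walk = [rng.choice(ports)]
        while len(walk) < len(trajectory):
            # Assumption: the next port is uniform among all other ports.
            walk.append(rng.choice([p for p in ports if p != walk[-1]]))
        random_f.append(link_usage(walk))
    mean_f = statistics.mean(random_f)
    sd_f = statistics.stdev(random_f)
    return (link_usage(trajectory) - mean_f) / sd_f

# A made-up liner service repeating the same loop of five ports six times.
loop = ["A", "B", "C", "D", "E"] * 6
p_loop = regularity_index(loop)
```

A strictly repeating loop reuses the same few links many times, so its link usage lies far above that of the random ensemble and the index is strongly positive.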

VI. APPROXIMATING TRAFFIC FLOWS USING THE GRAVITY MODEL 

In this article, we view global ship movements as a network based on detailed arrival and 
departure records. Until recently, surveys of seaborne trade had to rely on far less data: 
only the total number of arrivals at some major ports were publicly accessible, but not the 
ships' actual paths (Zachcial & Heideloff 2001). Missing information about the frequency 
of journeys, thus, had to be replaced by plausible assumptions, the gravity model being 
the most popular choice. It posits that trips are, in general, more likely between nearby 
ports than between ports far apart. If $d_{ij}$ is the distance between ports $i$ and $j$, the decline
in mutual interaction is expressed in terms of a distance deterrence function $f(d_{ij})$. The
number of journeys from $i$ to $j$ then takes the form $F_{ij} = a_i b_j O_i I_j f(d_{ij})$, where $O_i$ is the total
number of departures from port $i$ and $I_j$ the number of arrivals at $j$ (Haynes & Fotheringham
1984). The coefficients $a_i$ and $b_j$ are needed to ensure $\sum_j F_{ij} = O_i$ and $\sum_i F_{ij} = I_j$.

How well can the gravity model approximate real ship traffic? We choose a truncated
power law for the deterrence function, $f(d_{ij}) = d_{ij}^{-\beta} \exp(-d_{ij}/\kappa)$. The strongest correlation
between model and data is obtained for $\beta = 0.59$ and $\kappa = 4900$ km (see Electronic
Supplementary Material). At first sight, the agreement between data and model appears
indeed impressive. The predicted distribution of travelled distances (Fig. 6a) fits the data
far better than a simpler non-spatial model that preserves the total number of journeys, but 
assumes completely random origins and destinations. 
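
A doubly constrained gravity model of this form can be evaluated numerically by alternately rescaling the balancing coefficients until both sets of constraints are met (iterative proportional fitting). The sketch below assumes this solution method, a fixed number of iterations, and no journeys from a port to itself; the three ports, their departure and arrival counts, and the distances are made up for illustration.

```python
import math

def deterrence(distance_km, beta=0.59, kappa_km=4900.0):
    """Truncated power law f(d) = d**(-beta) * exp(-d / kappa)."""
    return distance_km ** (-beta) * math.exp(-distance_km / kappa_km)

def gravity_flows(departures, arrivals, distance_km, iterations=200):
    """Gravity model flows F[i][j] = a_i * b_j * O_i * I_j * f(d_ij).

    The balancing coefficients a_i and b_j are found by iterative
    proportional fitting (an assumed solution method).
    """
    n = len(departures)
    f = [[0.0 if i == j else deterrence(distance_km[i][j]) for j in range(n)]
         for i in range(n)]
    a = [1.0] * n
    b = [1.0] * n
    for _ in range(iterations):
        for i in range(n):
            a[i] = 1.0 / sum(b[j] * arrivals[j] * f[i][j] for j in range(n))
        for j in range(n):
            b[j] = 1.0 / sum(a[i] * departures[i] * f[i][j] for i in range(n))
    return [[a[i] * b[j] * departures[i] * arrivals[j] * f[i][j]
             for j in range(n)] for i in range(n)]

# Toy system of three ports (made-up numbers).
O = [100.0, 80.0, 60.0]          # departures per port
I = [90.0, 70.0, 80.0]           # arrivals per port
D = [[0, 1000, 5000],
     [1000, 0, 3000],
     [5000, 3000, 0]]            # distances in km
F = gravity_flows(O, I, D)
```

After convergence the row sums of `F` reproduce the departures and the column sums reproduce the arrivals, which is exactly the role of the coefficients $a_i$ and $b_j$.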

A closer look at the gravity model, however, reveals its limitations. In Fig. 6b we count
how often links with an observed number of journeys $N_{ij}$ are predicted to be passed $F_{ij}$
times. Ideally all data points would align along the diagonal $F_{ij} = N_{ij}$, but we find that the
data are substantially scattered. Although the parameters $\beta$ and $\kappa$ were chosen to minimize
the scatter, the correlation between data and model is only moderate (Kendall's $\tau = 0.433$).
In some cases, the prediction is off by several thousand journeys per year. 

Recent studies have used the gravity model to pinpoint the ports and routes central 
to the spread of invasive species (Drake & Lodge 2004, Tatem et al. 2006). The model's 
shortcomings raise the question of how reliable such predictions are. For this purpose, we
investigated a dynamic model of ship-mediated bioinvasion where the weights of the links 
are either the observed traffic flows or the flows of the gravity model. 

We follow previous epidemiological studies (Rvachev & Longini 1985, Flahault et al. 1988, 
Hufnagel et al. 2004, Colizza et al. 2006) in viewing the spread on the network as a metapop- 
ulation process where the population dynamics on the nodes are coupled by transport on 
the links. In our model, ships can transport a surviving population of an invasive species 
with only a small probability $p_{trans} = 1\%$ on each journey between two successively visited
ports. The transported population is only a tiny fraction $s$ of the population at the port
of origin. Immediately after arriving at a new port, the species experiences strong demographic
fluctuations which lead in most cases to the death of the imported population. If
however the new immigrants beat the odds of this "ecological roulette" (Carlton & Geller
1993) and establish, the population $P$ grows rapidly following the stochastic logistic equation
$\frac{dP}{dt} = rP(1 - P) + \sqrt{P}\,\xi(t)$ with growth rate $r = 1$/year and Gaussian white noise $\xi$.
For details of the model, we refer to the Electronic Supplementary Material. 
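
A stochastic logistic equation of this type can be integrated numerically with the Euler-Maruyama scheme. The sketch below is illustrative only: the time step, the `noise` scaling factor, the absorbing boundary at $P = 0$ and the function name are assumptions made here, and the actual simulation details are given in the Electronic Supplementary Material.

```python
import math
import random

def simulate_logistic(p0, years, r=1.0, noise=1.0, dt=0.001, seed=0):
    """Euler-Maruyama integration of dP/dt = r P (1 - P) + noise * sqrt(P) * xi(t).

    `noise`, `dt` and the absorbing boundary at P = 0 are assumptions
    of this sketch, not parameters taken from the study.
    """
    rng = random.Random(seed)
    p = p0
    for _ in range(int(round(years / dt))):
        if p <= 0.0:
            return 0.0  # the population went extinct
        drift = r * p * (1.0 - p) * dt
        # Gaussian white noise contributes increments of variance dt.
        diffusion = noise * math.sqrt(p) * rng.gauss(0.0, math.sqrt(dt))
        p = max(p + drift + diffusion, 0.0)
    return p

# Without noise, a small founder population approaches carrying capacity.
p_deterministic = simulate_logistic(0.01, years=30, noise=0.0)
# With noise, the outcome depends on the random seed and may be extinction.
p_noisy = simulate_logistic(0.01, years=5, noise=1.0, seed=1)
```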

Starting from a single port at carrying capacity $P = 1$, we model contacts between ports
as Poisson processes with rates $N_{ij}$ (empirical data) or $F_{ij}$ (gravity model). As shown in
Fig. 7a, the gravity model systematically overestimates the spreading rate, and the difference
can become particularly pronounced for ports which are well-connected, but not among the
central hubs in the network (Fig. 7b). Comparing typical sequences of infected ports, we
find that the invasions driven by the real traffic flows tend to be initially confined to smaller
regional ports, whereas in the gravity model the invasions quickly reach the hubs. The total
out- and in-flows at the ship journeys' origin and destination ports, respectively, are indeed
more strongly positively correlated in reality than in the model (r = 0.157 vs. 0.047).
The gravity model thus erases too many details of a hierarchical structure present in the
real network. That the gravity model eliminates most correlations is also plausible from
simple analytic arguments (see Electronic Supplementary Material for details). The absence
of strong correlations makes the gravity model a suitable null hypothesis if the correlations 
in the real network are unknown, but several recent studies have shown that correlations 
play an important role in spreading processes on networks (e.g. Newman 2002, Boguna & 
Pastor-Satorras 2002). Hence, if the correlations are known, they should not be ignored. 

While we observed that the spreading rates for the AIS data were consistently slower than 
for the gravity model even when different parameters or population models were considered, 
the time scale of the invasion is much less predictable. The assumption that only a small 
fraction of invaders succeed outside their native habitat appears realistic (Mack et al. 2000). 
Furthermore, the parameters in our model were adjusted so that the per-ship-call probability
of initiating invasion is approximately $4.4 \times 10^{-4}$, a rule-of-thumb value stated by Drake
& Lodge (2004). Still, too little is empirically known to pin down individual parameters 
with sufficient accuracy to give more than a qualitative impression. It is especially difficult 
to predict how a potential invader reacts to the environmental conditions at a specific lo- 
cation. Growth rates certainly differ greatly between ports depending on factors such as 
temperature or salinity, with respect to the habitat requirements of the invading organisms. 
Our results should, therefore, be regarded as one of many different conceivable scenarios. A 
more detailed study of bioinvasion risks based on the GCSN is currently underway (Seebens 
& Blasius 2009). 

VII. CONCLUSION 

We have presented a study of ship movements based on AIS records. Viewing the ports as
nodes in a network linked by ship journeys, we found that global cargo shipping, like many
other complex networks investigated in recent years, possesses the small world property as
well as broad degree and weight distributions. Other features, like the importance of canals
and the grouping of ports into regional clusters, are more specific to the shipping industry.
An important characteristic of the network is the difference in the movement patterns of
different ship types. Bulk dry carriers and oil tankers tend to move in a less regular manner
between ports than container ships. This is an important result regarding the spread of
invasive species because bulk dry carriers and oil tankers often sail empty and therefore
exchange large quantities of ballast water. The gravity model, which is the traditional
approach to forecasting marine biological invasions, captures some broad trends of global
cargo trade, but for many applications its results are too crude. Future strategies to curb
invasions will have to take more details into account. The network structure presented in
this article can be viewed as a first step in this direction.

Ship class        | ships | MGT   | n   | $\langle k \rangle$ | C    | $\langle l \rangle$ | $\langle J \rangle$ | $\mu$ | $\eta$ | $\langle N \rangle$ | $\langle L \rangle$ | $\langle S \rangle$ | $\langle p \rangle$
Whole fleet       | 16363 | 664.7 | 951 | 76.4 | 0.49 | 2.5  | 13.57 | 1.71 | 1.02 | 10.4 | 15.6 | 31.8 | 0.63
Container ships   | 3100  | 116.8 | 378 | 32.4 | 0.52 | 2.76 | 24.25 | 1.42 | 1.05 | 11.2 | 21.2 | 48.9 | 1.84
Bulk dry carriers | 5498  | 196.8 | 616 | 44.6 | 0.43 | 2.57 | 4.65  | 1.93 | 1.13 | 8.9  | 10.4 | 12.2 | 0.03
Oil tankers       | 2628  | 178.4 | 505 | 33.3 | 0.44 | 2.74 | 5.07  | 1.73 | 1.01 | 9.2  | 12.9 | 17.7 | 0.19

TABLE I: Characterization of different subnetworks. Number of ships, total gross tonnage [$10^6$
GT] and number of ports $n$ in each subnetwork; together with network characteristics: mean degree
$\langle k \rangle$, clustering coefficient $C$, mean shortest path length $\langle l \rangle$, mean journeys per link $\langle J \rangle$, power-law
exponents $\mu$ and $\eta$; and trajectory properties: average number of distinct ports $\langle N \rangle$, links $\langle L \rangle$,
port calls $\langle S \rangle$ per ship and regularity index $\langle p \rangle$. Some notable values are highlighted in bold.
FIG. 1: Routes, ports and betweenness centralities in the global cargo ship network (GCSN). (a)
The trajectories of all cargo ships bigger than 10,000 GT during 2007. The color scale indicates
the number of journeys along each route. Ships are assumed to travel along the shortest (geodesic)
paths on water. (b) A map of the 50 ports of highest betweenness centrality and a ranked list of
the 20 most central ports.
FIG. 2: Degrees and weights in the global cargo ship network * (insets: subnetworks for container
ships □, bulk dry carriers ○, and oil tankers △). (a) The degree distributions P(k) are right-skewed
but are not power laws, either for the GCSN or for its subnetworks. The degree k is defined here as
the sum of in- and out-degree, thus k = 1 is rather rare. (b) The link weight distributions P(w)
reveal clear power law relationships for the GCSN and the three subnetworks, with exponents
μ characteristic for the movement patterns of the different ship types. (c) The node strength
distributions P(s) are also heavy-tailed, showing power law relationships. The stated exponents
are calculated by linear regression with 95% confidence intervals (similar results are obtained with
maximum likelihood estimates, see Electronic Supplementary Material). (d) The average strength
of a node ⟨s(k)⟩ scales superlinearly with its degree, ⟨s(k)⟩ ∝ k^(1.46±0.1), indicating that highly
connected ports have, on average, links of higher weight.
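
The regression mentioned in the caption above can be illustrated with a minimal sketch: for a distribution of the form P(w) ∝ w^-μ, the exponent is the negative slope of a straight line fitted to log P(w) versus log w. The function name, the absence of binning and the synthetic data are assumptions made for illustration. The sketch does not reproduce the confidence intervals or the maximum likelihood estimates referred to in the caption.

```python
import math


def fit_power_law_exponent(values, probabilities):
    """Estimate mu in P(w) ~ w^(-mu) by ordinary least squares on log-log data.

    values and probabilities are equally long sequences of positive numbers.
    Returns the negative slope of the fitted line (hypothetical helper).
    """
    xs = [math.log(v) for v in values]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope


if __name__ == "__main__":
    # Synthetic, noise-free data that follow an exact power law with mu = 1.71.
    weights = [1, 2, 4, 8, 16, 32, 64]
    probs = [0.5 * w ** -1.71 for w in weights]
    print(fit_power_law_exponent(weights, probs))
```

On real, noisy data the result depends on how the histogram is binned and on the fitted range, which is why a comparison with maximum likelihood estimates is a useful check.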
FIG. 3: Communities of ports in three cargo ship subnetworks. The communities are groups of
ports that maximize the number of links within the groups, as opposed to between the groups,
in terms of the modularity Q (Leicht & Newman 2008). In each map, the colors represent the
c distinct trading communities for the goods transported by (a) container ships, (b) bulk dry
carriers, and (c) oil tankers. The optimal values for c and Q are stated in the lower right corners.
All modularities Q of the examined networks differ significantly from modularities in Erdős-Rényi
graphs of the same size and number of links (Guimerà et al. 2004). For the networks corresponding
to (a), (b) and (c) values are Q_ER = 0.219, Q_ER = 0.182 and Q_ER = 0.220, respectively.
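
As a rough illustration of the quantity being maximized, the sketch below evaluates the modularity of a given partition of a directed network, using the standard directed form Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j). The function name and the toy network are assumptions. The sketch only scores a partition that is supplied to it and does not perform the optimization over partitions. Whether links should carry weights is left to the caller: setting every weight to 1 gives the link-count version.

```python
from collections import defaultdict


def directed_modularity(edges, community):
    """Modularity Q of a partition of a directed network.

    edges: dict mapping (source, target) to a positive link weight.
    community: dict mapping every node to its community label.
    """
    m = sum(edges.values())
    out_strength = defaultdict(float)
    in_strength = defaultdict(float)
    within = 0.0  # total weight of links that stay inside a community
    for (source, target), weight in edges.items():
        out_strength[source] += weight
        in_strength[target] += weight
        if community[source] == community[target]:
            within += weight
    out_by_community = defaultdict(float)
    in_by_community = defaultdict(float)
    for node, label in community.items():
        out_by_community[label] += out_strength[node]
        in_by_community[label] += in_strength[node]
    # Expected within-community weight if links were placed at random
    # while preserving the in- and out-strengths.
    expected = sum(out_by_community[c] * in_by_community[c]
                   for c in out_by_community) / m
    return (within - expected) / m


if __name__ == "__main__":
    # Toy network: two directed triangles joined by a single link.
    toy_edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1,
                 ("d", "e"): 1, ("e", "f"): 1, ("f", "d"): 1,
                 ("c", "d"): 1}
    toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(directed_modularity(toy_edges, toy_partition))
```

Placing all nodes in a single community gives Q = 0, so positive values indicate more links within groups than expected by chance.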
FIG. 4: Motif distributions of the three main cargo fleets. A positive (negative) normalized Z score
indicates that a motif is more (less) frequent in the real network than in random networks with
the same degree sequence. For comparison, we overlay the Z scores of the World Wide Web and
social networks. The agreement suggests that the ship networks fall in the same superfamily of
networks (Milo et al. 2004). The motif distributions of the fleets are maintained even when 25%,
50% and 75% of the weakest connections are removed.
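
The normalized Z scores described above can be sketched as follows, assuming that the motif counts in the real network and in an ensemble of randomized networks are already available. The Z score of a motif is its observed count minus the ensemble mean, divided by the ensemble standard deviation, and the normalization divides each Z score by the square root of the sum of all squared Z scores. The function names and the toy counts are hypothetical, and the generation of random networks with the same degree sequence is not shown.

```python
import math
import statistics


def motif_z_scores(real_counts, random_counts):
    """Z score per motif from observed counts and counts in randomized networks.

    real_counts: dict mapping motif label to the count in the real network.
    random_counts: dict mapping motif label to a list of counts, one per
    randomized network (at least two per motif).
    """
    z_scores = {}
    for motif, observed in real_counts.items():
        samples = random_counts[motif]
        spread = statistics.stdev(samples)
        if spread == 0:
            raise ValueError("randomized counts for %r do not vary" % (motif,))
        z_scores[motif] = (observed - statistics.mean(samples)) / spread
    return z_scores


def normalize_z_scores(z_scores):
    """Scale the Z scores so that their squares sum to one."""
    norm = math.sqrt(sum(z * z for z in z_scores.values()))
    return {motif: z / norm for motif, z in z_scores.items()}


if __name__ == "__main__":
    # Toy counts, invented purely for illustration.
    real = {"motif_1": 30, "motif_2": 5}
    randomized = {"motif_1": [10, 12, 8, 10], "motif_2": [9, 11, 10, 10]}
    print(normalize_z_scores(motif_z_scores(real, randomized)))
```

The normalization makes profiles from networks of very different sizes comparable, which is what allows the overlay with other network types.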
FIG. 5: Sample trajectories of (a) a container ship with a regularity index p = 2.09, (b) a bulk dry 
carrier, p = 0.098, (c) an oil tanker, p = 1.027. In the three trajectories, the numbers and the line 
thickness indicate the frequency of journeys on each link. (d)-(f) Distribution of p for the three
main fleets. 
FIG. 6: (a) Histogram of port-to-port distances travelled in the GCSN (navigable distances around
continents as indicated in Fig. 1). We overlay the predictions of two different models. The gravity
model (red), based on information about distances between ports and total port calls, gives a much
better fit than a simpler model (blue) which only fixes the total number of journeys. (b) Count of
port pairs with Nij observed and Fij predicted journeys. The flows Fij were calculated with the
gravity model (rounded to the nearest integer). Some of the worst outliers are highlighted in blue.
o: Antwerp to Calais (Nij = vs. Fij = 200). △: Hook of Holland to Europoort (16 vs. 1895). o:
Calais to Dover (4392 vs. 443). □: Harwich to Hook of Holland (644 vs. 0).
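
To make the idea of a gravity model concrete, the following sketch predicts flows that grow with the number of departures at the origin and arrivals at the destination and decay with distance. It is a deliberately simplified, unconstrained variant: the power-law distance decay d^-beta, the overall rescaling to a fixed total number of journeys, and all names and numbers are assumptions for illustration. It should not be read as the exact model specification used for the figure.

```python
def gravity_flows(departures, arrivals, distance, beta, total_journeys):
    """Predict journeys between port pairs with a simple gravity model.

    departures, arrivals: dicts mapping port name to total port calls.
    distance: dict mapping frozenset({port_a, port_b}) to a distance > 0.
    beta: exponent of the assumed power-law distance decay.
    total_journeys: the predicted flows are rescaled to sum to this value.
    """
    raw = {}
    for origin, out_calls in departures.items():
        for destination, in_calls in arrivals.items():
            if origin == destination:
                continue
            d = distance[frozenset((origin, destination))]
            raw[(origin, destination)] = out_calls * in_calls * d ** (-beta)
    scale = total_journeys / sum(raw.values())
    return {pair: scale * value for pair, value in raw.items()}


if __name__ == "__main__":
    # Toy data for three hypothetical ports.
    calls = {"A": 100, "B": 50, "C": 50}
    dist = {frozenset(("A", "B")): 100.0,
            frozenset(("A", "C")): 1000.0,
            frozenset(("B", "C")): 500.0}
    flows = gravity_flows(calls, calls, dist, beta=1.0, total_journeys=200)
    for pair in sorted(flows):
        print(pair, round(flows[pair], 2))
```

A model of this kind cannot produce outliers such as those listed in the caption, because its predictions depend only on port sizes and distances and not on the actual routes that ships are scheduled to serve.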
FIG. 7: Results from a stochastic population model for the spread of an invasive species between
ports. (a) The invasion starts from one single, randomly chosen port. (b) The initial port is fixed
as Bergen (Norway), an example of a well-connected port (degree k = 49) which is not among the
central hubs. The rate of journeys from port i to j per year is assumed to be Nij (real flows from the
GCSN) or Fij (gravity model). Each journey has a small probability of transporting a tiny fraction
of the population from origin to destination. Parameters were adjusted (r = 1/year, ptrans = 0.01,
s = 4 × 10^[…]) to yield a per-ship-call probability of initiating invasion of ≈ 4.4 × 10^-4 (Drake & Lodge
2004, see Electronic Supplementary Material for details). Plotted are the cumulative numbers of
invaded ports (population number larger than half the carrying capacity) averaged over (a) 14,000,
(b) 1000 simulation runs (standard error equal to line thickness).
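
The ingredients named in the caption above (growth rate r, transport probability ptrans, transported fraction s, and the threshold of half the carrying capacity) can be combined into a toy simulation. This is a strongly simplified sketch and not the model of the Electronic Supplementary Material: growth is deterministic and logistic, the carrying capacity is normalized to 1, time advances in fixed steps, and the source port starts at carrying capacity. Because the sketch has no demographic stochasticity, every transported inoculum eventually establishes, so it does not reproduce the per-ship-call probability quoted in the caption. All names, the toy network and the value passed for s are hypothetical.

```python
import random


def simulate_invasion(journeys_per_year, start_port, years, r, p_trans, s,
                      steps_per_year=52, seed=0):
    """Toy spread of an invader over a directed shipping network.

    journeys_per_year: dict mapping (origin, destination) to journeys per year.
    Returns the number of invaded ports (population > 0.5) after each step.
    """
    rng = random.Random(seed)
    ports = {port for link in journeys_per_year for port in link}
    ports.add(start_port)
    population = {port: 0.0 for port in ports}
    population[start_port] = 1.0  # assumption: source at carrying capacity
    dt = 1.0 / steps_per_year
    invaded_counts = []
    for _ in range(years * steps_per_year):
        # Deterministic logistic growth (explicit Euler step).
        for port in list(population):
            x = population[port]
            population[port] = x + r * x * (1.0 - x) * dt
        # Stochastic transport along every link.
        for (origin, destination), rate in journeys_per_year.items():
            if population[origin] <= 0.0:
                continue
            expected = rate * dt
            trips = int(expected)
            if rng.random() < expected - trips:
                trips += 1  # randomized rounding of the fractional part
            for _ in range(trips):
                if rng.random() < p_trans:
                    moved = s * population[origin]
                    population[origin] -= moved
                    population[destination] = min(
                        1.0, population[destination] + moved)
        invaded_counts.append(sum(1 for x in population.values() if x > 0.5))
    return invaded_counts


if __name__ == "__main__":
    # Hypothetical four-port network; s = 1e-5 is a placeholder value.
    toy_flows = {("A", "B"): 200, ("B", "A"): 200,
                 ("B", "C"): 50, ("C", "D"): 20}
    counts = simulate_invasion(toy_flows, "A", years=40,
                               r=1.0, p_trans=0.01, s=1e-5)
    print(counts[::52])
```

Running the function once with observed flows and once with gravity-model flows, and averaging over many seeds, would give curves of the kind plotted in the figure, although the numbers from this simplified sketch are not comparable with the published ones.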



